Combining Light-Weight Retrieval Strategies for Robust Text Categorization

Author

  • Patrick Ruch
Abstract

We report on the development of a general-purpose text categorization system designed to automatically assign biomedical categories to any input text. Unlike typical automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent: it can be used when no training data are available, provided that a small set of instances is available for tuning the system. As is usual with information retrieval engines, the tool returns a ranked list of categories, which the user can then filter interactively.
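The abstract does not say which retrieval strategies are combined. As a rough illustration of the general idea only, the sketch below treats each category as a short "document" made of descriptor terms, scores an input text against every category with two simple retrieval measures (word-level and character-trigram cosine similarity), fuses the two scores linearly, and returns a ranked list. The two measures, the fusion weight `alpha`, the toy `categories` dictionary, and the helpers `rank_categories` and `tune_alpha` are all hypothetical assumptions, not the author's method. `tune_alpha` mimics the idea of using a small set of labelled instances to tune the system rather than to train a model.

```python
import math
import re
from collections import Counter

# Toy, hypothetical category descriptors (not from the paper): each category
# is represented only by a few descriptor terms, so no training data is needed.
categories = {
    "Neoplasms": "neoplasms tumor cancer carcinoma",
    "Cardiovascular Diseases": "cardiovascular heart cardiac artery hypertension",
    "Infections": "infection bacterial viral sepsis",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_vector(text: str) -> Counter:
    """Bag-of-words term frequencies."""
    return Counter(tokenize(text))


def trigram_vector(text: str) -> Counter:
    """Character trigram frequencies of each token, padded with '#'."""
    grams: Counter = Counter()
    for tok in tokenize(text):
        padded = f"#{tok}#"
        for i in range(len(padded) - 2):
            grams[padded[i : i + 3]] += 1
    return grams


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_categories(text, categories, alpha=0.5):
    """Hypothetical fusion: alpha * word score + (1 - alpha) * trigram score.

    Returns (category, score) pairs sorted from best to worst.
    """
    doc_words, doc_grams = word_vector(text), trigram_vector(text)
    scored = []
    for name, descriptors in categories.items():
        s_word = cosine(doc_words, word_vector(descriptors))
        s_gram = cosine(doc_grams, trigram_vector(descriptors))
        scored.append((name, alpha * s_word + (1 - alpha) * s_gram))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def tune_alpha(examples, categories, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Pick the fusion weight with the best top-1 accuracy on a small labelled set."""

    def top1_accuracy(alpha):
        hits = sum(
            rank_categories(text, categories, alpha)[0][0] == gold
            for text, gold in examples
        )
        return hits / len(examples)

    return max(grid, key=top1_accuracy)


if __name__ == "__main__":
    # Print the ranked list a user could then filter interactively.
    for name, score in rank_categories("Carcinoma of the breast: tumor growth", categories):
        print(f"{name}: {score:.3f}")
```

Because nothing is trained, adding a category in this sketch only means adding an entry to `categories`; the small tuning set is used solely to choose a single fusion weight.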

Similar resources

Feature Preparation in Text Categorization

Text categorization is an important application of machine learning to the field of document information retrieval. Most machine learning methods treat text documents as feature vectors. We report text categorization accuracy for different types of features and different types of feature weights. The comparison of these classifiers shows that stemmed or un-stemmed single words as features giv...

Integrating a Structured-Text Retrieval System with an Object-Oriented Database System

We describe the integration of a structured-text retrieval system (TextMachine) into an object-oriented database system (OpenODB). Our approach is a light-weight one, using the external function capability of the database system to encapsulate the text retrieval system as an external information source. Yet, we are able to provide a tight integration in the query language and processing; the us...

Learning-Free Text Categorization

In this paper, we report on the fusion of simple retrieval strategies with thesaural resources in order to perform large-scale text categorization tasks. Unlike most related systems, which rely on training data in order to infer text-to-concept relationships, our approach can be applied with any controlled vocabulary and does not use any training data. The first classification module uses a tra...

Combining image content and annotated text for medical image categorization and retrieval

The richness of health information available online requires the development of efficient information retrieval methods. The CISMeF health catalogue provides indexing and searching capabilities for health resources. Medical images represent a significant part of online medical knowledge and a valuable component of diagnosis and teaching. In this context, a combined text and image extract...

An Improved Algorithm of Bayesian Text Categorization

Text categorization is a fundamental methodology of text mining and has been a hot research topic in data mining and web mining in recent years. It plays an important role in traditional information retrieval, web indexing architectures, Web information retrieval, and so on. This paper presents an improved algorithm of text categorization that combines the feature weighting technique with...

Publication date: 2005